A Database Interface for Clustering in Large Spatial Databases

Authors

  • Martin Ester
  • Hans-Peter Kriegel
  • Xiaowei Xu

Abstract

Both the number and the size of spatial databases are rapidly growing because of the large amount of data obtained from satellite images, X-ray crystallography or other scientific equipment. Therefore, automated knowledge discovery becomes more and more important in spatial databases. So far, most of the methods for knowledge discovery in databases (KDD) have been based on relational database systems. In this paper, we address the task of class identification in spatial databases using clustering techniques. We present an interface to the database management system (DBMS), which is crucial for the efficiency of KDD on large databases. This interface is based on a spatial access method, the R*-tree. It clusters the objects according to their spatial neighborhood and supports efficient processing of spatial queries. Furthermore, we propose a method for spatial data sampling as part of the focusing component, significantly reducing the number of objects to be clustered. Thus, we achieve a considerable speed-up for clustering in large databases. We have applied the proposed techniques to real data from a large protein database used for predicting protein-protein docking. A performance evaluation on this database indicates that clustering on large spatial databases can be performed both efficiently and effectively using our approach.
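The focusing idea in the abstract is to let the spatial index choose a small, spatially spread-out sample, cluster only that sample, and then map every object back to the result. Below is a minimal sketch under stated assumptions, not the authors' implementation. A real R*-tree is not built here: square grid cells (the hypothetical `cell_size` parameter) stand in for R*-tree data pages, because both group spatially neighboring objects. Picking the object nearest each page's centroid as its representative is also an assumption of this sketch. The clustering of the sample itself (for example with a k-medoid method such as CLARANS) is omitted, and the names `bucket_points`, `representative`, `focus_sample` and `assign_to_nearest` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bucket_points(points: List[Point], cell_size: float) -> Dict[Tuple[int, int], List[Point]]:
    """Group points into square grid cells.

    Hypothetical stand-in for R*-tree data pages: like a leaf page,
    each cell holds objects that are spatial neighbors.
    """
    pages: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        pages[key].append((x, y))
    return dict(pages)


def representative(page: List[Point]) -> Point:
    """Return the object of a page that lies closest to the page centroid."""
    cx = sum(p[0] for p in page) / len(page)
    cy = sum(p[1] for p in page) / len(page)
    return min(page, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


def focus_sample(points: List[Point], cell_size: float) -> List[Point]:
    """Draw one representative per page; only these go to the clustering step."""
    pages = bucket_points(points, cell_size)
    return [representative(page) for page in pages.values()]


def assign_to_nearest(points: List[Point], medoids: List[Point]) -> List[int]:
    """Label every object with the index of its nearest medoid."""
    labels = []
    for x, y in points:
        dists = [(x - mx) ** 2 + (y - my) ** 2 for mx, my in medoids]
        labels.append(dists.index(min(dists)))
    return labels


if __name__ == "__main__":
    data = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.25),
            (5.0, 5.0), (5.2, 5.2), (5.6, 5.1)]
    sample = focus_sample(data, cell_size=1.0)
    print(len(data), "->", len(sample))  # prints "6 -> 2"
    # For the demo, the sample itself serves as the medoids; a real run
    # would first cluster the sample and pass the resulting medoids here.
    print(assign_to_nearest(data, sample))
```

With two well-separated groups, six objects shrink to two representatives, and every object is then assigned to the representative of its own group. The saving grows with the number of objects per page, which is why a page-based sample can cut clustering cost on large databases.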

Similar resources

PFDC: A Parallel Algorithm for Fast Density-based Clustering in Large Spatial Databases

Clustering – the grouping of objects depending on their spatial proximity – is one important technique of knowledge discovery in spatial databases. One of the proposed algorithms for this is FDC [5], which uses a density-based clustering approach. Since there is a need for parallel processing in very large databases to distribute resource allocation, this paper presents PFDC, a parallel version...

Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification

Both the number and the size of spatial databases are rapidly growing because of the large amount of data obtained from satellite images, X-ray crystallography or other scientific equipment. Therefore, automated knowledge discovery becomes more and more important in spatial databases. So far, most of the methods for knowledge discovery in databases (KDD) have been based on relational database ...

Clustering and Knowledge Discovery in Spatial Databases

In the past decades, clustering has been widely used in areas such as pattern recognition, data analysis, and image processing. Recently, clustering has been recognized as a useful method for knowledge discovery in spatial databases. To efficiently detect clusters from large spatial databases with limited amount of available memory, special database techniques have been developed. In this article...

D-GridMST: Clustering Large Distributed Spatial Databases

In this paper, we will propose a distributable clustering algorithm, called Distributed-GridMST (D-GridMST), which deals with large distributed spatial databases. D-GridMST employs the notions of multi-dimensional cube to partition the data space involved and uses density criteria to extract representative points from spatial databases, based on which a global MST of representatives is construc...

Publication date: 1995